Abstract

This paper provides a duality gap convergence analysis for the standard ADMM as well as 
a linearized version of ADMM. It is shown that under appropriate conditions, both methods 
achieve linear convergence. However, the standard ADMM achieves a faster accelerated convergence
rate than that of the linearized ADMM. A simple numerical example is used to illustrate
the difference in convergence behavior. 


1 Introduction 


This paper considers the following optimization problem: 

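$$\min_{w,v}\ [\phi(w) + g(v)] \quad \text{subject to } Aw - Bv = c, \tag{1}$$

where $(w, v) \in \mathbb{R}^n \times \mathbb{R}^m$ are unknown vectors, and $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^p$ are known matrices and a known vector. In this paper, we assume that $\phi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ are convex functions.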

A popular method for solving (1) is the Alternating Direction Method of Multipliers (ADMM)
algorithm. It solves the problem by alternately optimizing the variables in the augmented
Lagrangian function:

$$L(w, v, \alpha, \rho) = \phi(w) + g(v) + \alpha^\top (Aw - Bv - c) + \frac{\rho}{2}\|Aw - Bv - c\|_2^2, \tag{2}$$

and the resulting procedure is summarized in Algorithm 1. In the algorithm, both $G$ and $H$ are
symmetric positive semi-definite matrices. In the standard ADMM, we can set $G = 0$ and $H = 0$.
The method of introducing the additional term $\|v - v^{t-1}\|_G^2 = (v - v^{t-1})^\top G (v - v^{t-1})$ is often
referred to as preconditioning. If we let $G = \beta I - B^\top B$ for a sufficiently large $\beta > 0$ such that
$G$ is positive semi-definite, then the minimization problem to obtain $v^t$ in line 3 of Algorithm 1
becomes:

$$v^t = \arg\min_v \left[ g(v) - \left(B^\top \alpha^{t-1} + \rho B^\top (Aw^{t-1} - c) + \rho G v^{t-1}\right)^\top v + \frac{\rho\beta}{2} v^\top v \right],$$

which may be simpler to solve than the corresponding problem with $G = 0$, since the original
quadratic term $v^\top B^\top B v$ is now replaced by $v^\top v$. The additional term $\|w - w^{t-1}\|_H^2$ can play a
similar preconditioning role.






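Algorithm 1 Preconditioned Standard ADMM Algorithm
1: Choose $w^0$, $v^0$, and $\alpha^0$
2: for $t = 1, 2, \ldots$ do
3:   $v^t = \arg\min_v \left[ g(v) - (\alpha^{t-1})^\top Bv + \frac{\rho}{2}\|Aw^{t-1} - Bv - c\|_2^2 + \frac{\rho}{2}\|v - v^{t-1}\|_G^2 \right]$;
4:   $w^t = \arg\min_w \left[ \phi(w) + (\alpha^{t-1})^\top Aw + \frac{\rho}{2}\|Aw - Bv^t - c\|_2^2 + \frac{1}{2}\|w - w^{t-1}\|_H^2 \right]$;
5:   $\alpha^t = \alpha^{t-1} + \rho(Aw^t - Bv^t - c)$;
6: end for
7: Output: $w^t$, $v^t$, $\alpha^t$.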
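To make the three update lines of Algorithm 1 concrete, the following minimal sketch specializes them to quadratic objectives $\phi(w) = \frac12 w^\top Q w$ and $g(v) = \frac12 v^\top \Lambda v$ with $G = 0$ and $H = 0$, where each argmin reduces to a linear solve. The function name `admm_quadratic`, the test matrices, and the choice $\rho = 1$ are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def admm_quadratic(Q, Lam, A, B, c, rho, w0, v0, alpha0, num_iters=100):
    """Standard ADMM (G = 0, H = 0) for phi(w) = 0.5 w'Qw and g(v) = 0.5 v'Lam v."""
    w, v, alpha = w0.astype(float), v0.astype(float), alpha0.astype(float)
    for _ in range(num_iters):
        # Line 3: stationarity gives (Lam + rho B'B) v = B'alpha + rho B'(A w - c)
        v = np.linalg.solve(Lam + rho * B.T @ B,
                            B.T @ alpha + rho * B.T @ (A @ w - c))
        # Line 4: stationarity gives (Q + rho A'A) w = -A'alpha + rho A'(B v + c)
        w = np.linalg.solve(Q + rho * A.T @ A,
                            -A.T @ alpha + rho * A.T @ (B @ v + c))
        # Line 5: multiplier update
        alpha = alpha + rho * (A @ w - B @ v - c)
    return w, v, alpha

# Hypothetical test problem (not from the paper)
Q = np.diag([1.0, 2.0])
Lam = np.diag([0.5, 1.0])
A = np.eye(2)
B = np.eye(2)
c = np.array([1.0, -1.0])
w, v, alpha = admm_quadratic(Q, Lam, A, B, c, 1.0,
                             np.ones(2), np.ones(2), np.zeros(2))
```

At convergence the iterates should be feasible and satisfy the optimality conditions $A^\top\alpha = -\nabla\phi(w)$ and $B^\top\alpha = \nabla g(v)$, which gives a simple way to check an implementation.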


For simplicity, this paper focuses on the scenario in which $g(\cdot)$ is strongly convex and $\phi(\cdot)$ is smooth.
The results allow $g(\cdot)$ to include a constraint $v \in \Omega$ for a convex set $\Omega$ by setting $g(v) = +\infty$ when
$v \notin \Omega$. The same proof technique can also handle the other three cases with one objective function
being smooth and one being strongly convex.

The standard ADMM algorithm assumes that the optimization problem to obtain $w^t$ is simple.
If this optimization is difficult to perform, then we may also consider the linearized ADMM
formulation, which replaces $\phi(w)$ by a quadratic approximation $\phi_H(w^{t-1}; w)$ defined as

$$\phi_H(w^{t-1}; w) = \phi(w^{t-1}) + \nabla\phi(w^{t-1})^\top (w - w^{t-1}) + \frac{1}{2}(w - w^{t-1})^\top H (w - w^{t-1}).$$

The resulting algorithm is described in Algorithm 2. Both $H$ and $G$ are symmetric positive
semi-definite matrices. By setting $H = \beta' I - \rho A^\top A$, we can replace the term $w^\top A^\top A w$ by $w^\top w$ in the
optimization of line 4 of Algorithm 2.


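Algorithm 2 Preconditioned Linearized ADMM Algorithm
1: Choose $w^0$, $v^0$, and $\alpha^0$
2: for $t = 1, 2, \ldots$ do
3:   $v^t = \arg\min_v \left[ g(v) - (\alpha^{t-1})^\top Bv + \frac{\rho}{2}\|Aw^{t-1} - Bv - c\|_2^2 + \frac{\rho}{2}\|v - v^{t-1}\|_G^2 \right]$;
4:   $w^t = \arg\min_w \left[ \phi_H(w^{t-1}; w) + (\alpha^{t-1})^\top Aw + \frac{\rho}{2}\|Aw - Bv^t - c\|_2^2 \right]$;
5:   $\alpha^t = \alpha^{t-1} + \rho(Aw^t - Bv^t - c)$;
6: end for
7: Output: $w^t$, $v^t$, $\alpha^t$.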
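Because the linearized objective in line 4 of Algorithm 2 is quadratic in $w$, the update has a closed form regardless of $\phi$; only the gradient $\nabla\phi(w^{t-1})$ is needed. The sketch below implements that single step. The helper name `linearized_w_step` and the numerical values are hypothetical and only serve to illustrate the formula.

```python
import numpy as np

def linearized_w_step(grad_phi, H, A, B, c, rho, w_prev, v, alpha_prev):
    """Line 4 of Algorithm 2: minimize phi_H(w_prev; w) + alpha'Aw + rho/2 ||Aw - Bv - c||^2.

    Setting the gradient to zero gives
    (H + rho A'A) w = H w_prev - grad_phi(w_prev) - A'alpha + rho A'(B v + c).
    """
    rhs = (H @ w_prev - grad_phi(w_prev) - A.T @ alpha_prev
           + rho * A.T @ (B @ v + c))
    return np.linalg.solve(H + rho * A.T @ A, rhs)

# Hypothetical example with a quadratic phi(w) = 0.5 w'Qw, A = B = I, c = 0
Q = np.diag([1.0, 2.0])
H = 4.0 * np.eye(2)
rho = 1.0
w_prev = np.array([0.3, -0.2])
v = np.array([0.1, 0.4])
alpha_prev = np.array([-0.5, 0.25])
w_new = linearized_w_step(lambda w: Q @ w, H, np.eye(2), np.eye(2),
                          np.zeros(2), rho, w_prev, v, alpha_prev)
```

With $A = B = I$ and $c = 0$ this reduces to $w^t = (H + \rho I)^{-1}((H - Q)w^{t-1} + \rho v^t - \alpha^{t-1})$, the form used in the lower-bound discussion of Section 3.4.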


This paper compares the convergence behavior of the ADMM algorithm versus that of the
linearized ADMM algorithm for solving (1). Under the assumption that $A$ is invertible, $g(\cdot)$ is $\lambda$
strongly convex, and $\phi(\cdot)$ is $1/\gamma$ smooth, it is shown that the standard ADMM achieves a worst case
linear convergence rate of $1/(1 + \Theta(\sqrt{\lambda\gamma}))$ (with optimally chosen $\rho$) while the linearized ADMM
achieves a slower worst case linear convergence rate of $1/(1 + \Theta(\lambda\gamma))$.

The paper is organized as follows. Section 2 reviews related work. Section 3 provides a
theoretical analysis for both standard and linearized ADMM. Section 4 provides a simple numerical
example to illustrate the difference in convergence behavior. Concluding remarks are given in
Section 5.


2 Related Work on ADMM and Linearized ADMM


In this section, we review some previous work on the convergence analysis of ADMM and Linearized 
ADMM, focusing mainly on linear convergence results. 

2.1 Results for ADMM 

Many authors have studied the linear convergence of ADMM in recent years. For example, the
authors in [6] presented a novel proof for the linear convergence of the ADMM algorithm. Moreover,
the analysis applies to the more general case in which the objective function can be the summation
of more than two separable functions ($\phi$ and $g$ in our case). However, the assumption on each
separable function is very complex, and no explicit rate is obtained. Therefore their results are not
directly comparable to ours.

Another work is [4], which presented analysis for the linear convergence of generalized ADMM
under certain conditions. More comprehensive results for the general form of constraint $Aw - Bv = c$
were obtained later in [3] using similar ideas. That paper presented an extension of the ADMM
algorithm called Relaxed ADMM, which leads to linear convergence in the following four cases (it
also requires either $A$ or $B$ to be invertible): $\phi$ is strongly convex and smooth; $g$ is strongly convex
and smooth; $\phi$ is smooth and $g$ is strongly convex; $g$ is smooth and $\phi$ is strongly convex. However,
their analysis employs a technique for analyzing the dual objective of ADMM that may be regarded
as a Relaxed Peaceman-Rachford splitting method. It can be used to prove the dual convergence.
In contrast, our analysis uses a very different argument that can directly bound the convergence 
of primal objective function and the duality gap. Moreover, even when the required regularity 
conditions for linear convergence are not satisfied, our analysis immediately implies a sublinear 1/t 
convergence of duality gap (assuming a finite solution exists for the underlying problem). Therefore 
the analysis of this paper contains a unified treatment that can simultaneously handle both linear 
and sublinear convergence depending on the regularity condition. In contrast, although sublinear 
results can be obtained using techniques similar to those of [3] (see results in m), they require 
specialized treatment and the obtained results are in different forms that are not compatible with 
the duality gap convergence of this paper. In this setting, the operator splitting proof techniques 
of [3, 2] and the objective function proof technique of this paper are complementary to each other.
Another advantage of our proof technique is that it can be directly applied to linearized ADMM 
with minimal modifications. 

Our analysis employs a technique similar to that of [10] (note that neither linear convergence
nor duality gap convergence was studied in [10]). At the conceptual level, the technique is also
closely related to the analysis of [1], but the actual execution differs quite significantly. One may
view the analysis of this paper as a refined version of those in [10], in that we simultaneously handle
linear and sublinear cases depending on regularity conditions. Moreover, our analysis unifies the
techniques used in [10] (which deals with primal objective convergence) and the techniques used
in [1] (which deals with a special primal-dual objective convergence); our proof shows that the
seemingly different results in these two papers can be proved using the same underlying argument.
Although results similar to ours were presented in [1] for a procedure related to a specific form of
preconditioned ADMM (see [1] for discussions), they did not analyze the standard ADMM (or its
linearized version) under the general condition $Aw - Bv = c$. Therefore the results obtained in this
paper for ADMM are different from those of [1].

Another result on the linear convergence of the standard ADMM can be found in a recent
paper [9], which uses a different technique than what’s presented in this paper and that of mm- 
Their results are not directly comparable to ours. Moreover, some other work on the convergence 
of ADMM like procedures include mum. which focused on different applications that are not 
related to our work. 


2.2 Results for Linearized ADMM 


One advantage of our proof technique is that it also handles linearized ADMM, with new results not
available in the previous literature. Most previous work on linearized ADMM does not consider
linear convergence; the few papers that do impose strong assumptions on the matrices $A$, $B$, or the
functions $\phi$, $g$.

Several papers have considered linear convergence of Linearized ADMM. For example,
[6] considered linearized ADMM, but as mentioned earlier, their rate is not explicit and they impose
complex conditions that are incompatible with our results. Similarly, a linear convergence result
for linearized ADMM was also obtained in [8], but only under the assumption of $g = 0$ and some
strong constraints on the matrices $A$ and $B$. Again their results are incompatible with ours.

Some other work considered Linearized ADMM in the general case but without linear convergence.
For example, some authors considered the convergence of Linearized ADMM in several
different cases and obtained a sublinear convergence rate of $1/t$. Similar sublinear results can be found in
[10] for stochastic ADMM. As we have pointed out, our proof technique is closely related to that of
[10], which can handle both linearized and standard ADMM under the same theoretical framework.


3 Main Results 


This section provides our main results for the standard ADMM and the linearized ADMM. We will 
derive upper bounds on their convergence rates, as well as the worst case matching lower bounds 
for some specific problems. 


3.1 Notations 


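Given any convex function $h$, we may define its convex conjugate

$$h^*(\beta) = \sup_u\, [\beta^\top u - h(u)],$$

and define the Bregman divergence of a convex function $h(u)$ as:

$$D_h(u', u) = h(u) - h(u') - \nabla h(u')^\top (u - u').$$

We will assume that $\phi$ is $1/\gamma$ smooth:

$$\forall w, w', \quad D_\phi(w', w) \le \frac{1}{2\gamma}\|w' - w\|_2^2,$$

which also implies that

$$D_\phi(w', w) \ge \frac{\gamma}{2}\|\nabla\phi(w') - \nabla\phi(w)\|_2^2.$$

We also assume that $g$ is $\lambda$ strongly convex:

$$\forall v, v', \quad D_g(v', v) \ge \frac{\lambda}{2}\|v' - v\|_2^2.$$

Assume also that $(w_*, v_*, \alpha_*)$ is an optimal solution of (1), which satisfies the equalities:

$$Aw_* - Bv_* - c = 0, \quad A^\top\alpha_* = -\nabla\phi(w_*), \quad w_* = \nabla\phi^*(-A^\top\alpha_*), \quad B^\top\alpha_* = \nabla g(v_*). \tag{3}$$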

Given any $\alpha$, taking the infimum over $(w, v)$ of the Lagrangian

$$\phi(w) + g(v) + \alpha^\top (Aw - Bv - c),$$

we obtain the dual

$$D(\alpha) = -\phi^*(-A^\top\alpha) - g^*(B^\top\alpha) - \alpha^\top c.$$

It is clear by definition that for any pair $(w, v)$ that is feasible (that is, $Aw - Bv - c = 0$), and any
$\alpha$, we have $\phi(w) + g(v) \ge D(\alpha)$. The value $\phi(w) + g(v) - D(\alpha)$ is referred to as the duality gap.
The duality gap is never smaller than the primal suboptimality $[\phi(w) + g(v)] - [\phi(w_*) + g(v_*)]$. Therefore
if the duality gap is zero, then $(w, v)$ solves (1).
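As an illustration of the dual and the duality gap, the sketch below evaluates them for quadratic objectives $\phi(w) = \frac12 w^\top Q w$ and $g(v) = \frac12 v^\top \Lambda v$, for which the conjugates are $\phi^*(\beta) = \frac12\beta^\top Q^{-1}\beta$ and $g^*(\beta) = \frac12\beta^\top\Lambda^{-1}\beta$. The function names and the small test problem are assumptions made for this example only.

```python
import numpy as np

def dual_value(alpha, Q, Lam, A, B, c):
    """D(alpha) = -phi*(-A'alpha) - g*(B'alpha) - alpha'c for quadratic phi and g."""
    a = -A.T @ alpha
    b = B.T @ alpha
    return (-0.5 * a @ np.linalg.solve(Q, a)
            - 0.5 * b @ np.linalg.solve(Lam, b)
            - alpha @ c)

def duality_gap(w, v, alpha, Q, Lam, A, B, c):
    """phi(w) + g(v) - D(alpha); meaningful when (w, v) satisfies Aw - Bv = c."""
    primal = 0.5 * w @ Q @ w + 0.5 * v @ Lam @ v
    return primal - dual_value(alpha, Q, Lam, A, B, c)

# Hypothetical problem with A = B = I, so feasibility means w = v + c
Q = np.diag([1.0, 2.0])
Lam = np.diag([0.5, 1.0])
A = np.eye(2)
B = np.eye(2)
c = np.array([1.0, -1.0])

# Optimal primal-dual point, obtained by solving the optimality conditions (3) by hand
w_star = np.array([1.0 / 3.0, -1.0 / 3.0])
v_star = np.array([-2.0 / 3.0, 2.0 / 3.0])
alpha_star = np.array([-1.0 / 3.0, 2.0 / 3.0])
gap_at_optimum = duality_gap(w_star, v_star, alpha_star, Q, Lam, A, B, c)
```

The gap vanishes at the optimal primal-dual point and stays nonnegative for any feasible $(w, v)$ paired with any $\alpha$.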

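We may also introduce the concept of restricted duality gap as in [1]. Consider regions $B_1 \subset \mathbb{R}^p$
and $B_2 \subset \mathbb{R}^m$. Given any $\hat\alpha$, $\hat v$, we can define the restricted duality gap

$$G_{B_1 \times B_2}(\hat\alpha, \hat v) = \sup_{\alpha \in B_1,\, v \in B_2} \left[ \phi^*(-A^\top\hat\alpha) + g(\hat v) - \phi^*(-A^\top\alpha) - g(v) + \hat\alpha^\top (Bv + c) - \alpha^\top (B\hat v + c) \right].$$

If we pick $(\alpha, v) = (\alpha_*, v_*)$, then

$$D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha) + D_g(v_*, \hat v) = \phi^*(-A^\top\hat\alpha) + g(\hat v) - \phi^*(-A^\top\alpha_*) - g(v_*) + \hat\alpha^\top (Bv_* + c) - \alpha_*^\top (B\hat v + c).$$

Therefore as long as $(\alpha_*, v_*) \in B_1 \times B_2$, we have

$$D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha) + D_g(v_*, \hat v) \le G_{B_1 \times B_2}(\hat\alpha, \hat v).$$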

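Assume $AA^\top$ is invertible, and let

$$A^+ = A^\top (AA^\top)^{-1} \tag{4}$$

be the pseudo-inverse of $A$; then we may let $\hat w = A^+(B\hat v + c)$. It follows that $A\hat w - B\hat v - c = 0$. If
we set $B_1 \times B_2 = \mathbb{R}^p \times \mathbb{R}^m$, then we recover the unrestricted duality gap:

$$G_{\mathbb{R}^p \times \mathbb{R}^m}(\hat\alpha, \hat v) = [\phi(\hat w) + g(\hat v)] - D(\hat\alpha),$$

where the maximum over $(\alpha, v)$ is achieved at $-A^\top\alpha = \nabla\phi(\hat w)$ and $v = \nabla g^*(B^\top\hat\alpha)$.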
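The construction of a feasible $\hat w$ from $\hat v$ through the pseudo-inverse can be checked numerically. The snippet below is a small sketch with made-up matrices (a full-row-rank $A$, so that $AA^\top$ is invertible); none of the numbers come from the paper.

```python
import numpy as np

# Hypothetical data: A has full row rank, so A A^T is invertible
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
B = np.array([[1.0, 0.5],
              [0.0, 2.0]])
c = np.array([0.3, -0.7])
v_hat = np.array([1.0, -2.0])

A_plus = A.T @ np.linalg.inv(A @ A.T)   # right pseudo-inverse of A
w_hat = A_plus @ (B @ v_hat + c)        # w_hat = A^+ (B v_hat + c)
residual = A @ w_hat - B @ v_hat - c    # should vanish, i.e. (w_hat, v_hat) is feasible
```

For a full-row-rank matrix this expression coincides with the Moore-Penrose pseudo-inverse returned by `np.linalg.pinv`.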


3.2 Standard ADMM 

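In general, we have the following result.

Theorem 3.1 Assume that $\phi$ is $1/\gamma$ smooth and $g$ is $\lambda$ strongly convex. Assume that we can
write $H = A^\top \tilde H A$. Let $\sigma_{\max}(H)$ and $\sigma_{\max}(\tilde H)$ be the largest eigenvalues of $H$ and $\tilde H$ respectively,
$\sigma_{\min}(A)$ be the smallest eigenvalue of $(AA^\top)^{1/2}$, $\sigma_{\max}(B)$ be the largest singular value of
$B$, and $\sigma_{\max}(G)$ be the largest singular value of $G$. Consider $s \in [0, 1)$ and $\theta > 0$ such that

$$\theta \le \min\left( \frac{\rho\gamma\,\sigma_{\min}(A)^2}{\gamma\,\sigma_{\max}(H) + 1},\ \frac{s\rho}{\sigma_{\max}(\tilde H)},\ \frac{(1-s)\lambda}{(\rho + \sigma_{\max}(\tilde H))\,\sigma_{\max}(B)^2 + (1-s)\rho\,\sigma_{\max}(G)} \right).$$

Let $\hat\alpha^t = \alpha^t + \tilde H A(w^t - w^{t-1})$. Then for all $(\alpha, v)$ and $w = \nabla\phi^*(-A^\top\alpha)$, Algorithm 1 produces
approximate solutions that satisfy

$$\sum_{t=1}^T (1+\theta)^{t-T} r_t \le (1+\theta)^{-T}\delta_0 - \delta_T, \tag{5}$$

$$\sum_{t=1}^T (1+\theta)^{t-T} r_t^* \le (1+\theta)^{-T}\delta_0 - \delta_T, \tag{6}$$

where

$$\begin{aligned}
r_t &= \phi(w^t) + g(v^t) - \phi(w) - g(v) - (\hat\alpha^t)^\top (Aw - Bv - c) + \alpha^\top (Aw^t - Bv^t - c), \\
r_t^* &= \phi^*(-A^\top\hat\alpha^t) + g(v^t) - \phi^*(-A^\top\alpha) - g(v) + (\hat\alpha^t)^\top (Bv + c) - \alpha^\top (Bv^t + c), \\
\delta_t &= \frac{\rho}{2}\|Aw^t - Bv - c\|_2^2 + \frac{1}{2}\|Aw^t - Bv - c\|_{\tilde H}^2 + \frac{\rho(1+\theta)}{2}\|v^t - v\|_G^2 + \frac{1+\theta}{2\rho}\|\alpha - \alpha^t\|_2^2.
\end{aligned}$$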

For arbitrary $(\alpha, v)$, the left hand sides of (5) and (6) can be difficult to interpret. We may choose
specific values of $(\alpha, v)$ so that the results are easier to interpret. By setting $(\alpha, w, v) = (\alpha_*, w_*, v_*)$
in Theorem 3.1 and using (3), we obtain the following corollary.

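Corollary 3.1 Under the conditions of Theorem 3.1, we have

$$\begin{aligned}
&\sum_{t=1}^T (1+\theta)^{t-T} \left[ \max\left(D_\phi(w_*, w^t),\ D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha^t)\right) + D_g(v_*, v^t) \right] \\
&\quad + \frac{\rho}{2}\|A(w^T - w_*)\|_2^2 + \frac{1}{2}\|A(w^T - w_*)\|_{\tilde H}^2 + \frac{1+\theta}{2\rho}\|\alpha^T - \alpha_*\|_2^2 + \frac{\rho(1+\theta)}{2}\|v^T - v_*\|_G^2 \\
&\le \frac{(1+\theta)^{-T}}{2}\left[ \rho\|A(w^0 - w_*)\|_2^2 + \|A(w^0 - w_*)\|_{\tilde H}^2 + \frac{1+\theta}{\rho}\|\alpha^0 - \alpha_*\|_2^2 + \rho(1+\theta)\|v^0 - v_*\|_G^2 \right].
\end{aligned}$$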


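Using the definition of restricted duality gap, it is easy to see that (6) directly implies an upper
bound of restricted duality gap, which is in the same style as the results of [1]. Our result is more general
than those of [1] because the results can also be expressed in the form of Corollary 3.1 as well as
in terms of unrestricted duality gap, as stated below.

Corollary 3.2 Under the conditions of Theorem 3.1, let $A^+$ be the pseudo-inverse of $A$. Define

$$\begin{aligned}
\delta_*^0 &= (\rho + \sigma_{\max}(\tilde H))\|A(w^0 - w_*)\|_2^2 + \frac{1+\theta}{\rho}\|\alpha^0 - \alpha_*\|_2^2 + \rho(1+\theta)\|v^0 - v_*\|_G^2, \\
b_1(\delta) &= \sup_u \left\{ \|\alpha^0 + (A^+)^\top \nabla\phi(A^+(Bu + c))\|_2^2 \ :\ \|u - v_*\|_G^2 \le \frac{\delta}{\rho(1+\theta)} \right\}, \\
b_2(\delta) &= \sup_\beta \left\{ \|v^0 - \nabla g^*(B^\top\beta)\|_2^2 \ :\ \|\beta - \alpha_*\|_2 \le \sigma_{\max}(\tilde H)\sqrt{\frac{2(2+\theta)\delta}{\rho}} + \sqrt{\frac{\rho\delta}{1+\theta}} \right\}, \\
b(\delta) &= \frac{1+\theta}{2\rho}\, b_1(\delta) + \frac{\rho(1+\theta)}{2}\,\sigma_{\max}(G)\, b_2(\delta) + (\rho + \sigma_{\max}(\tilde H))\left(\sigma_{\max}(B)^2\, b_2(\delta) + \|Aw^0 - Bv^0 - c\|_2^2\right).
\end{aligned}$$

Then we have the following bound in duality gap

$$[\phi(A^+(Bv^T + c)) + g(v^T)] - D(\hat\alpha^T) \le (1+\theta)^{-T}\, b\left((1+\theta)^{-T}\delta_*^0\right).$$

Moreover, define

$$\bar v^T = \frac{\sum_{t=1}^T (1+\theta)^t v^t}{\sum_{t=1}^T (1+\theta)^t}, \qquad \bar\alpha^T = \frac{\sum_{t=1}^T (1+\theta)^t \hat\alpha^t}{\sum_{t=1}^T (1+\theta)^t}.$$

Then

$$[\phi(A^+(B\bar v^T + c)) + g(\bar v^T)] - D(\bar\alpha^T) \le \frac{b(\delta_*^0)}{\sum_{t=1}^T (1+\theta)^t}.$$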


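In the above results, we consider the simple case of $H = 0$. Then the optimal value of $\theta$ is
achieved when we take

$$\rho = \frac{1}{\sigma_{\min}(A)\sqrt{\sigma_{\max}(B)^2 + \sigma_{\max}(G)}}\sqrt{\frac{\lambda}{\gamma}}, \qquad \theta = \sigma_{\min}(A)\sqrt{\frac{\lambda\gamma}{\sigma_{\max}(B)^2 + \sigma_{\max}(G)}}.$$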
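The trade-off behind the optimal penalty parameter can be sketched numerically. With $H = 0$ the admissible $\theta$ in Theorem 3.1 is limited by two terms, one increasing and one decreasing in $\rho$, and the best $\rho$ balances them. The code below is a sketch under two assumptions that are ours rather than the paper's: the constraint involving $\sigma_{\max}(\tilde H)$ is treated as vacuous when $\tilde H = 0$, and $s = 0$ is used; the function names are hypothetical.

```python
import math

def theta_bound(rho, lam, gamma, sig_min_A, sig_max_B, sig_max_G):
    """Largest theta allowed by the two remaining constraints when H = 0 and s = 0."""
    return min(rho * gamma * sig_min_A ** 2,
               lam / (rho * (sig_max_B ** 2 + sig_max_G)))

def optimal_rho(lam, gamma, sig_min_A, sig_max_B, sig_max_G):
    """Penalty parameter that equalizes the two constraints."""
    return math.sqrt(lam / gamma) / (sig_min_A * math.sqrt(sig_max_B ** 2 + sig_max_G))

# Hypothetical setting: A = B = I, G = 0, lambda = 0.2, gamma = 0.1
params = dict(lam=0.2, gamma=0.1, sig_min_A=1.0, sig_max_B=1.0, sig_max_G=0.0)
rho_star = optimal_rho(**params)
theta_star = theta_bound(rho_star, **params)
```

For $A = B = I$ and $G = 0$ this gives $\rho = \sqrt{\lambda/\gamma}$ and $\theta = \sqrt{\lambda\gamma}$, and moving $\rho$ away from this value in either direction reduces the admissible $\theta$.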


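When $\theta > 0$, this implies the following convergence from Corollary 3.1:

$$\begin{aligned}
&\max\left[D_\phi(w_*, w^T),\ D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha^T)\right] + D_g(v_*, v^T) \\
&\quad + \frac{\rho}{2}\|A(w^T - w_*)\|_2^2 + \frac{1+\theta}{2\rho}\|\alpha^T - \alpha_*\|_2^2 + \frac{\rho(1+\theta)}{2}\|v^T - v_*\|_G^2 \\
&\le \left(1 + \sigma_{\min}(A)\sqrt{\frac{\lambda\gamma}{\sigma_{\max}(B)^2 + \sigma_{\max}(G)}}\right)^{-T} \left[ \frac{\rho}{2}\|A(w^0 - w_*)\|_2^2 + \frac{1+\theta}{2\rho}\|\alpha^0 - \alpha_*\|_2^2 + \frac{\rho(1+\theta)}{2}\|v^0 - v_*\|_G^2 \right].
\end{aligned}$$

This implies $\|w_* - w^T\|_2^2 = O((1+\theta)^{-T})$, $\|v_* - v^T\|_2^2 = O((1+\theta)^{-T})$, and $\|\alpha_* - \alpha^T\|_2^2 = O((1+\theta)^{-T})$.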

The linear convergence result holds when $\theta > 0$. However, even when $\theta = 0$ (and $H \ne 0$), we
can still obtain the following sublinear convergence from Corollary 3.1:

$$\max\left[ D_\phi(w_*, \bar w^T),\ \frac{1}{T}\sum_{t=1}^T D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha^t) \right] + D_g(v_*, \bar v^T) \le \frac{1}{2T}\left[ (\rho + \sigma_{\max}(\tilde H))\|A(w^0 - w_*)\|_2^2 + \frac{1}{\rho}\|\alpha^0 - \alpha_*\|_2^2 + \rho\|v^0 - v_*\|_G^2 \right],$$

where $\bar w^T = T^{-1}\sum_{t=1}^T w^t$ and $\bar v^T = T^{-1}\sum_{t=1}^T v^t$. This result does not require any assumption on $\phi$,
$g$, $A$, $B$.

Similar results hold for the unrestricted duality gap under the conditions of Corollary 3.2. For
example, when $\theta = 0$ but $AA^\top$ is invertible, we obtain the sublinear convergence of the duality gap
below, where $\bar v^T$ and $\bar\alpha^T$ denote the averages of $v^t$ and $\hat\alpha^t$ over $t = 1, \ldots, T$:

$$[\phi(A^+(B\bar v^T + c)) + g(\bar v^T)] - D(\bar\alpha^T) \le \frac{b(\delta_*^0)}{T}.$$

This bound can be compared to the main result of [1] stated in terms of the restricted duality gap
(in which the authors studied a method that is related to, but not identical to, ADMM). Their result
did not imply a bound on the unrestricted duality gap because they did not obtain a counterpart
of Corollary 3.1.

In the case that $\phi$ is smooth but $g$ is not a strongly convex function, given any $\epsilon > 0$, we can
set $\lambda = \epsilon$, and apply ADMM with $g(v)$ replaced by the strongly convex function $g(v) + \lambda v^\top v$. With
$\rho$ chosen optimally, this leads to

$$\phi(A^+(Bv^T + c)) + g(v^T) - D(\hat\alpha^T) = O(\epsilon)$$

when we take $T = \ln(1/(\gamma\epsilon))/\sqrt{\gamma\epsilon}$.


3.3 Linearized ADMM 


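For Linearized ADMM, we have the following counterpart of Theorem 3.1. Here we need to assume
that $A$ is invertible and $H$ is sufficiently large so that $\sigma_{\min}(H) > \gamma^{-1}$.

Theorem 3.2 Assume that $\phi$ is $1/\gamma$ smooth and $g$ is $\lambda$ strongly convex, and $A$ is a square
invertible matrix. Assume that we can write $H = A^\top \tilde H A$. Let $\sigma_{\min}(H)$ and $\sigma_{\max}(\tilde H)$ be the smallest
eigenvalue of $H$ and the largest eigenvalue of $\tilde H$ respectively, and we assume that $\sigma_{\min}(H) > \gamma^{-1}$.
Let $\sigma_{\min}(A)$ be the smallest eigenvalue of $(AA^\top)^{1/2}$, $\sigma_{\max}(B)$ be the largest singular value of
$B$, and $\sigma_{\max}(G)$ be the largest singular value of $G$. Consider $s \in [0, 1)$ and $\theta > 0$ such that

$$\theta \le \min\left( \frac{\rho\,\sigma_{\min}(A)^2}{\sigma_{\min}(H)},\ \frac{s\rho}{\sigma_{\max}(\tilde H)},\ \frac{(1-s)\lambda}{(\rho + \sigma_{\max}(\tilde H))\,\sigma_{\max}(B)^2 + (1-s)\rho\,\sigma_{\max}(G)} \right).$$

Let $\hat\alpha^t = \alpha^t + \tilde H A(w^t - w^{t-1})$. Then for any $(\alpha, v)$, and $w = \nabla\phi^*(-A^\top\alpha)$, Algorithm 2 produces
approximate solutions that satisfy

$$\sum_{t=1}^T (1+\theta)^{t-T} r_t \le (1+\theta)^{-T}\delta_0 - \delta_T, \tag{7}$$

$$\sum_{t=1}^T (1+\theta)^{t-T} r_t^* \le (1+\theta)^{-T}\delta_0 - \delta_T, \tag{8}$$

where

$$\begin{aligned}
r_t &= \phi(w^{t-1}) + g(v^t) - \phi(w) - g(v) - (\hat\alpha^t)^\top (Aw - Bv - c) + \alpha^\top (Aw^{t-1} - Bv^t - c), \\
r_t^* &= \phi^*(-A^\top\hat\alpha^t) + g(v^t) - \phi^*(-A^\top\alpha) - g(v) + (\hat\alpha^t)^\top (Bv + c) - \alpha^\top (Bv^t + c), \\
\delta_t &= \frac{1}{2}\left[ \|Aw^t - Bv - c\|_{\tilde H}^2 + \rho\|Aw^t - Bv - c\|_2^2 + \rho(1+\theta)\|v^t - v\|_G^2 + \frac{1+\theta}{\rho}\|\alpha^t - \alpha\|_2^2 \right].
\end{aligned}$$
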
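Similar to the corollaries of Theorem 3.1, we have the following corollaries of Theorem 3.2.

Corollary 3.3 Under the conditions of Theorem 3.2, we have

$$\begin{aligned}
&\sum_{t=1}^T (1+\theta)^{t-T} \left[ \max\left(D_\phi(w_*, w^{t-1}),\ D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha^t)\right) + D_g(v_*, v^t) \right] \\
&\quad + \frac{1}{2}\|A(w^T - w_*)\|_{\tilde H}^2 + \frac{\rho}{2}\|A(w^T - w_*)\|_2^2 + \frac{1+\theta}{2\rho}\|\alpha^T - \alpha_*\|_2^2 + \frac{\rho(1+\theta)}{2}\|v^T - v_*\|_G^2 \\
&\le \frac{(1+\theta)^{-T}}{2}\left[ (\rho + \sigma_{\max}(\tilde H))\|A(w^0 - w_*)\|_2^2 + \frac{1+\theta}{\rho}\|\alpha^0 - \alpha_*\|_2^2 + \rho(1+\theta)\|v^0 - v_*\|_G^2 \right].
\end{aligned}$$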


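Corollary 3.4 Under the conditions of Theorem 3.2, let $A^+$ be the pseudo-inverse of $A$. If we
define $\delta_*^0$, $b(\delta)$, and $(\bar v^T, \bar\alpha^T)$ as in Corollary 3.2, then

$$[\phi(A^+(Bv^T + c)) + g(v^T)] - D(\hat\alpha^T) \le (1+\theta)^{-T}\, b\left((1+\theta)^{-T}\delta_*^0\right),$$

$$[\phi(A^+(B\bar v^T + c)) + g(\bar v^T)] - D(\bar\alpha^T) \le \frac{b(\delta_*^0)}{\sum_{t=1}^T (1+\theta)^t}.$$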

The requirement of $\sigma_{\min}(H) > \gamma^{-1}$ is the key difference between Theorem 3.1 and Theorem 3.2.
The fast convergence of ADMM requires $H$ to be of order $O(\rho)$, which may be smaller than
$O(\gamma^{-1})$. Consider the case that $H = \Theta(\gamma^{-1} I)$ for linearized ADMM; then the optimal $\rho$ can be
chosen as $\rho = \Theta(\gamma^{-1})$. This leads to a linear convergence with $\theta = \Theta(\lambda\gamma)$. The rate is slower than
that of the standard ADMM, which can achieve $\theta = \Theta(\sqrt{\lambda\gamma})$ at the optimal choice of $\rho$.

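Similar to the case of standard ADMM, we could take $\theta = 0$; as long as $\phi$ is $1/\gamma$ smooth, and
$H$ satisfies $\sigma_{\min}(H) > 2/\gamma$, we can achieve the following sublinear convergence without additional
assumptions:

$$\frac{1}{T}\sum_{t=1}^T \left[ \max\left(D_\phi(w_*, w^{t-1}),\ D_{\phi^*}(-A^\top\alpha_*, -A^\top\hat\alpha^t)\right) + D_g(v_*, v^t) \right] \le \frac{1}{2T}\left[ (\rho + \sigma_{\max}(\tilde H))\|A(w^0 - w_*)\|_2^2 + \rho\|v^0 - v_*\|_G^2 + \frac{1}{\rho}\|\alpha^0 - \alpha_*\|_2^2 \right].$$

A similar result holds for duality gap convergence when $A$ is a square invertible matrix.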


3.4 Lower Bounds 

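We consider the quadratic case that $A = B = I$, $c = 0$, and

$$\phi(w) = \frac{1}{2} w^\top Q w, \qquad g(v) = \frac{1}{2} v^\top \Lambda v.$$

The optimal solution is

$$w_* = v_* = \alpha_* = 0.$$

We show that with appropriately chosen $Q$ and $\Lambda$ so that $Q$ is $1/\gamma$ smooth, and both $\Lambda$ and $Q$ are
$\lambda$ strongly convex, the convergence rate of ADMM can be $1 - \Theta(\sqrt{\gamma\lambda})$ and the convergence rate of
linearized ADMM can be $1 - \Theta(\gamma\lambda)$.

ADMM

We assume that $Q$ and $\Lambda$ are diagonal matrices.

The ADMM iterate satisfies the following equations (with $G = 0$):

$$\begin{aligned}
v^t &= (\Lambda + \rho I)^{-1}(\alpha^{t-1} + \rho w^{t-1}) \\
w^t &= (Q + \rho I)^{-1}(\rho v^t - \alpha^{t-1}) \\
\alpha^t &= \alpha^{t-1} + \rho(w^t - v^t),
\end{aligned}$$

which implies

$$\begin{aligned}
v^t &= (\Lambda + \rho I)^{-1}(\alpha^{t-1} + \rho w^{t-1}) \\
w^t &= (\Lambda + \rho I)^{-1}(Q + \rho I)^{-1}(\rho^2 w^{t-1} - \Lambda\alpha^{t-1}) \\
\alpha^t &= (\Lambda + \rho I)^{-1}(Q + \rho I)^{-1} Q(\Lambda\alpha^{t-1} - \rho^2 w^{t-1}).
\end{aligned}$$

We may write $[w^t; \alpha^t] = M[w^{t-1}; \alpha^{t-1}]$. Now we take $Q = \Lambda = \mathrm{diag}(\lambda, 1/\gamma)$, where we assume
that $\lambda \le 1/\gamma$. Then the largest eigenvalue of $M$, which determines the rate of convergence of
ADMM, is

$$\max\left[ \frac{\rho^2 + \lambda^2}{(\rho + \lambda)^2},\ \frac{\rho^2\gamma^2 + 1}{(\rho\gamma + 1)^2} \right].$$

The optimal $\rho$ to minimize the above is $\rho = \sqrt{\lambda/\gamma}$, and the resulting value is $(1 + \gamma\lambda)/(1 + \sqrt{\gamma\lambda})^2$.
This special case matches the convergence rate behavior of $1 - \Theta(\sqrt{\gamma\lambda})$ we proved for the ADMM
method.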
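The lower-bound computation for ADMM can be reproduced numerically. The sketch below builds, for one diagonal coordinate with $A = B = 1$ and $c = 0$, the $2 \times 2$ matrix that maps $[w^{t-1}; \alpha^{t-1}]$ to $[w^t; \alpha^t]$, and takes the slower of the two coordinates of $Q = \Lambda = \mathrm{diag}(\lambda, 1/\gamma)$. Function names and the values $\lambda = 0.2$, $\gamma = 0.1$ are illustrative assumptions.

```python
import math
import numpy as np

def admm_iteration_matrix(q, lam, rho):
    """One-coordinate map [w; alpha] -> [w; alpha] for standard ADMM (G = H = 0)."""
    denom = (lam + rho) * (q + rho)
    return np.array([[rho ** 2, -lam],
                     [-q * rho ** 2, q * lam]]) / denom

def spectral_radius(M):
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def admm_rate(lam, gamma, rho):
    """Slowest coordinate when Q = Lambda = diag(lam, 1/gamma)."""
    return max(spectral_radius(admm_iteration_matrix(x, x, rho))
               for x in (lam, 1.0 / gamma))

lam, gamma = 0.2, 0.1
rho_opt = math.sqrt(lam / gamma)
rate_at_opt = admm_rate(lam, gamma, rho_opt)
closed_form = (1.0 + gamma * lam) / (1.0 + math.sqrt(gamma * lam)) ** 2
```

The numerically computed spectral radius at $\rho = \sqrt{\lambda/\gamma}$ should agree with the closed-form value $(1 + \gamma\lambda)/(1 + \sqrt{\gamma\lambda})^2$, and other choices of $\rho$ should not do better.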

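Linearized ADMM

We assume that $H$, $Q$, and $\Lambda$ are diagonal matrices. The linearized ADMM iterate satisfies the
following equations (with $G = 0$):

$$\begin{aligned}
v^t &= (\Lambda + \rho I)^{-1}(\alpha^{t-1} + \rho w^{t-1}) \\
w^t &= (H + \rho I)^{-1}((H - Q)w^{t-1} + \rho v^t - \alpha^{t-1}) \\
\alpha^t &= \alpha^{t-1} + \rho(w^t - v^t),
\end{aligned}$$

which implies that

$$\begin{aligned}
v^t &= (\Lambda + \rho I)^{-1}(\alpha^{t-1} + \rho w^{t-1}) \\
w^t &= (\Lambda + \rho I)^{-1}(H + \rho I)^{-1}\left(((\rho I + \Lambda)(H - Q) + \rho^2 I)w^{t-1} - \Lambda\alpha^{t-1}\right) \\
\alpha^t &= (\Lambda + \rho I)^{-1}(H + \rho I)^{-1} H\left(\Lambda\alpha^{t-1} + \rho(\Lambda - (\rho I + \Lambda)H^{-1}Q)w^{t-1}\right).
\end{aligned}$$

Now let $\lambda \le 1/\gamma$, and we take $Q = \Lambda = \mathrm{diag}(\lambda, 1/\gamma)$, and $H = \mathrm{diag}(2/\gamma, 2/\gamma)$. It follows that the
convergence rate of linearized ADMM is no faster than the largest eigenvalue of

$$M = \frac{1}{(\rho + h)(\rho + \lambda)} \begin{bmatrix} \rho^2 + (\rho + \lambda)(h - q) & -\lambda \\ \rho(\lambda h - (\rho + \lambda)q) & \lambda h \end{bmatrix}$$

with $q = \lambda$ and $h = 2/\gamma$. When $\rho \le h - \lambda$, the largest eigenvalue of $M$ is no less than

$$\frac{\rho^2 + (h - q)\rho + (h - q)\lambda}{\rho^2 + (h + \lambda)\rho + h\lambda} \ge \frac{h - \lambda}{h + \lambda} = 1 - \Theta(\lambda\gamma).$$

Similarly, it is also not difficult to check that the eigenvalue is no less than $1 - \Theta(\lambda\gamma)$ when $\rho \ge h - \lambda$.
It follows that this special case matches the convergence rate behavior of $1 - \Theta(\gamma\lambda)$ we proved for
the linearized ADMM method.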
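The iteration matrix for linearized ADMM can be checked against the recursion it is derived from. The sketch below builds the one-coordinate matrix $M$ (with $A = B = 1$, $c = 0$, $G = 0$ and scalar $h$, $q$, $\lambda$), confirms that one step of the recursion equals multiplication by $M$, and compares its spectral radius with the standard ADMM rate $(\rho^2 + q\lambda)/((\rho + \lambda)(\rho + q))$ for the same coordinate. The numbers $\gamma = 1$, $\lambda = 0.5$, $\rho = 1$ are made up for illustration.

```python
import numpy as np

def linearized_iteration_matrix(q, lam, h, rho):
    """One-coordinate map [w; alpha] -> [w; alpha] for linearized ADMM (G = 0)."""
    denom = (rho + h) * (rho + lam)
    return np.array([[rho ** 2 + (rho + lam) * (h - q), -lam],
                     [rho * (lam * h - (rho + lam) * q), lam * h]]) / denom

def linearized_step(w, alpha, q, lam, h, rho):
    """One pass of lines 3-5 of Algorithm 2 for scalar quadratic phi and g."""
    v = (alpha + rho * w) / (lam + rho)
    w_new = ((h - q) * w + rho * v - alpha) / (h + rho)
    alpha_new = alpha + rho * (w_new - v)
    return w_new, alpha_new

# Hypothetical values: gamma = 1, lambda = 0.5, q = lambda, h = 2 / gamma
gamma, lam, rho = 1.0, 0.5, 1.0
q, h = lam, 2.0 / gamma
M = linearized_iteration_matrix(q, lam, h, rho)
radius_linearized = float(np.max(np.abs(np.linalg.eigvals(M))))
radius_standard = (rho ** 2 + q * lam) / ((rho + lam) * (rho + q))
```

In this example the linearized method contracts more slowly than standard ADMM on the same coordinate, and its spectral radius stays above $(h - \lambda)/(h + \lambda)$.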

4 Numerical Illustration 

We have obtained both the worst case upper bounds and matching lower bounds for
ADMM and Linearized ADMM. The analysis shows that in the worst case ADMM converges at a
faster rate of $1 - \Theta(\sqrt{\lambda\gamma})$ while in the worst case Linearized ADMM converges at a slower rate of
$1 - \Theta(\lambda\gamma)$.

However, for any specific problem, both methods can converge faster than the corresponding 
worst case upper bounds obtained in this paper. In this section, we use a simple example to 
illustrate the real convergence behavior of ADMM versus linearized ADMM methods at different 
choices of p’s, to illustrate the phenomenon that the former can converge significantly faster than 
the latter. 

Consider the following 1-dimensional problem:

$$\phi(w) = \frac{w}{\sqrt{\gamma}}\arctan\left(\frac{w}{\sqrt{\gamma}}\right) - \frac{1}{2}\ln\left(1 + \frac{w^2}{\gamma}\right) + \frac{\mu}{2}w^2, \qquad g(v) = \frac{\lambda}{2}v^2,$$

with $A = B = I$ and $c = 0$. It can be checked that $\phi(w)$ is $1/\gamma + \mu$ smooth and $\mu$ strongly convex;
$g(v)$ is $\lambda$-strongly convex.

We compare the convergence of ADMM versus linearized ADMM with different values of $\rho$. In
linearized ADMM, we set $h = 2(\mu + 1/\gamma)$. Note that for this problem, $w_* = v_* = 0$, and we
can define the error of a solution $(w, v)$ as $\sqrt{w^2 + v^2}$.

Figure 1 shows the convergence behavior when $\gamma = 0.1$ and $\lambda = \mu = 0.2$. This is the situation
in which $\lambda\gamma = 0.02$ is relatively small. In this case, we compare three different values of $\rho$:
$\rho = 0.2\sqrt{\lambda/\gamma}$, $\rho = \sqrt{\lambda/\gamma}$, and $\rho = 5\sqrt{\lambda/\gamma}$. The corresponding convergence rates for ADMM are 0.51,
0.21, and 0.41; the corresponding convergence rates for linearized ADMM are 0.51, 0.53, and 0.64.
This shows that ADMM is superior to Linearized ADMM across these values of $\rho$. Moreover, it achieves a relatively
fast convergence rate at the optimal choice of $\rho = \sqrt{\lambda/\gamma}$, while Linearized ADMM is relatively
insensitive to $\rho$.

Figure 2 shows the convergence behavior when $\gamma = \lambda = \mu = 1$. This is the situation in which
$\lambda\gamma = 1$ is relatively large. We compare three different values of $\rho$: $\rho = 0.2\sqrt{\lambda/\gamma}$, $\rho = \sqrt{\lambda/\gamma}$,
and $\rho = 5\sqrt{\lambda/\gamma}$. The corresponding convergence rates for ADMM are 0.78, 0.49, and 0.64; the
corresponding convergence rates for linearized ADMM are 0.82, 0.69, and 0.82. The relative
convergence behaviors of ADMM and linearized ADMM are consistent with those of Figure 1.
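A rough simulation of this 1-dimensional experiment can be sketched as follows. It assumes the objective $\phi(w) = \frac{w}{\sqrt\gamma}\arctan(\frac{w}{\sqrt\gamma}) - \frac12\ln(1 + \frac{w^2}{\gamma}) + \frac{\mu}{2}w^2$ and $g(v) = \frac{\lambda}{2}v^2$, so that $\phi'(w) = \arctan(w/\sqrt\gamma)/\sqrt\gamma + \mu w$. The starting point $w^0 = v^0 = 0.01$, $\alpha^0 = 0$, the bisection solver for the exact $w$-update, and the way the rate is estimated (ratio of the last two errors) are our own choices, not the paper's experimental protocol.

```python
import math

def dphi(w, gamma, mu):
    """phi'(w) for the assumed 1-dimensional objective."""
    s = math.sqrt(gamma)
    return math.atan(w / s) / s + mu * w

def exact_w_step(alpha, v, rho, gamma, mu, w_guess):
    """Solve phi'(w) + alpha + rho (w - v) = 0 by bisection (left side is increasing in w)."""
    F = lambda w: dphi(w, gamma, mu) + alpha + rho * (w - v)
    radius = abs(F(w_guess)) / (mu + rho)   # valid bracket because F' >= mu + rho
    lo, hi = w_guess - radius, w_guess + radius
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def empirical_rate(linearized, gamma, lam, mu, rho, w0=0.01, iters=40):
    """Run (linearized) ADMM with A = B = 1, c = 0, G = 0 and return the last error ratio."""
    w, v, alpha = w0, w0, 0.0
    h = 2.0 * (mu + 1.0 / gamma)
    errors = []
    for _ in range(iters):
        v = (alpha + rho * w) / (lam + rho)
        if linearized:
            w = (h * w - dphi(w, gamma, mu) - alpha + rho * v) / (h + rho)
        else:
            w = exact_w_step(alpha, v, rho, gamma, mu, w)
        alpha += rho * (w - v)
        errors.append(math.hypot(w, v))
    return errors[-1] / errors[-2]

gamma, lam, mu = 0.1, 0.2, 0.2
rho = math.sqrt(lam / gamma)
rate_admm = empirical_rate(False, gamma, lam, mu, rho)
rate_lin = empirical_rate(True, gamma, lam, mu, rho)
```

Linearizing the recursions around the optimum suggests asymptotic contraction factors of roughly 0.22 for ADMM and 0.53 for linearized ADMM at $\rho = \sqrt{\lambda/\gamma}$, in line with the middle entries reported for Figure 1.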

Figure 1: Convergence of ADMM versus that of Linearized ADMM ($\gamma = 0.1$, $\lambda = \mu = 0.2$)

Figure 2: Convergence of ADMM versus that of Linearized ADMM ($\gamma = \lambda = \mu = 1$)

5 Conclusion

This paper presents a new duality gap convergence analysis of standard ADMM versus linearized
ADMM under conditions commonly studied in the literature. It is shown that in the worst case,
the standard ADMM converges with an accelerated rate that is faster than that of the linearized
ADMM. Matching lower bounds are obtained for specific problems. A simple numerical example
illustrates this behavior. One consequence of our analysis is that the standard ADMM does not
require Nesterov’s acceleration scheme in theory because it already enjoys the square root
convergence rate for smooth-strongly convex problems. On the other hand, linearized ADMM may
still benefit from extra acceleration steps. Finally, the results obtained in this paper only show the
worst case behaviors for both algorithms (under appropriate assumptions commonly used in the
literature). In practice, both methods might converge faster, and it remains open to study such
faster convergence rates under additional suitable assumptions.

A Proof of Theorem 3.1

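The fact that $w^t$ minimizes the objective function in line 4 of Algorithm 1, together with the
relationship of $\alpha^t$ and $\alpha^{t-1}$ in line 5, implies that

$$\nabla\phi(w^t) + A^\top\alpha^t = H(w^{t-1} - w^t). \tag{9}$$

We thus obtain

$$\begin{aligned}
&\phi(w^t) - \phi(w) + (\alpha^t)^\top A(w^t - w) + (Aw - Bv - c)^\top \tilde H A(w^{t-1} - w^t) \\
&\le -\frac{\gamma}{2}\|\nabla\phi(w^t) - \nabla\phi(w)\|_2^2 + \nabla\phi(w^t)^\top (w^t - w) + (\alpha^t)^\top A(w^t - w) + (Aw - Bv - c)^\top \tilde H A(w^{t-1} - w^t) \\
&= -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 + (w^{t-1} - w^t)^\top A^\top \tilde H A(w^t - w) + (Aw - Bv - c)^\top \tilde H A(w^{t-1} - w^t) \\
&= -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 + \frac{1}{2}\left[ \|Aw^{t-1} - Bv - c\|_{\tilde H}^2 - \|Aw^t - Bv - c\|_{\tilde H}^2 - \|Aw^t - Aw^{t-1}\|_{\tilde H}^2 \right].
\end{aligned} \tag{10}$$

In the above derivation, the inequality is a direct consequence of the smoothness of $\phi$, which implies
that for any $w'$ and $w$, $\phi(w) \ge \phi(w') + \nabla\phi(w')^\top (w - w') + 0.5\gamma\|\nabla\phi(w) - \nabla\phi(w')\|_2^2$. The first equality
is due to (9), and $\nabla\phi(w) + A^\top\alpha = 0$ (which follows from the assumption $w = \nabla\phi^*(-A^\top\alpha)$ of the
theorem). The second equality is algebra.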

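We also have from the optimality of $v^t$ for minimizing the objective function in line 3 of
Algorithm 1 and the relationship of $\alpha^t$ and $\alpha^{t-1}$ in line 5:

$$\nabla g(v^t) - B^\top\alpha^t = -\rho G(v^t - v^{t-1}) + \rho B^\top A(w^{t-1} - w^t). \tag{11}$$

Therefore

$$\begin{aligned}
&g(v^t) - g(v) + \frac{\lambda}{2}\|v^t - v\|_2^2 - (\alpha^t)^\top B(v^t - v) \\
&\le \nabla g(v^t)^\top (v^t - v) - (\alpha^t)^\top B(v^t - v) \\
&= \rho(v^t - v^{t-1})^\top G(v - v^t) + \rho(w^t - w^{t-1})^\top A^\top B(v - v^t) \\
&= \frac{\rho}{2}\left[ \|v - v^{t-1}\|_G^2 - \|v - v^t\|_G^2 - \|v^t - v^{t-1}\|_G^2 \right] \\
&\quad + \frac{\rho}{2}\left[ \|Aw^{t-1} - Bv - c\|_2^2 + \|Aw^t - Bv^t - c\|_2^2 - \|Aw^t - Bv - c\|_2^2 - \|Aw^{t-1} - Bv^t - c\|_2^2 \right] \\
&= \frac{\rho}{2}\left[ \|v - v^{t-1}\|_G^2 - \|v - v^t\|_G^2 - \|v^t - v^{t-1}\|_G^2 \right] + \frac{1}{2\rho}\|\alpha^t - \alpha^{t-1}\|_2^2 \\
&\quad + \frac{\rho}{2}\left[ \|Aw^{t-1} - Bv - c\|_2^2 - \|Aw^t - Bv - c\|_2^2 - \|Aw^{t-1} - Bv^t - c\|_2^2 \right].
\end{aligned} \tag{12}$$

In the above derivation, the first inequality is due to the strong convexity of $g(\cdot)$. The first equality
employs (11). The second equality is algebra, and the third equality is due to the relationship of
$\alpha^t$ and $\alpha^{t-1}$ in line 5 of Algorithm 1.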
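Finally we have

$$\begin{aligned}
-(\alpha^t - \alpha)^\top (Aw^t - Bv^t - c) &= -\frac{1}{\rho}(\alpha^t - \alpha)^\top (\alpha^t - \alpha^{t-1}) \\
&= \frac{1}{2\rho}\left[ \|\alpha - \alpha^{t-1}\|_2^2 - \|\alpha - \alpha^t\|_2^2 - \|\alpha^t - \alpha^{t-1}\|_2^2 \right],
\end{aligned} \tag{13}$$

where the first equality uses the relationship of $\alpha^t$ and $\alpha^{t-1}$ in line 5 of Algorithm 1 and the second
equality is algebra.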

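By adding (10), (12), and (13), we obtain

$$\begin{aligned}
&\phi(w^t) + g(v^t) - \phi(w) - g(v) - (\hat\alpha^t)^\top (Aw - Bv - c) + \alpha^\top (Aw^t - Bv^t - c) \\
&\le -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 - \frac{\lambda}{2}\|v^t - v\|_2^2 \\
&\quad + \frac{1}{2}\left[ \|Aw^{t-1} - Bv - c\|_{\tilde H}^2 - \|Aw^t - Bv - c\|_{\tilde H}^2 - \|Aw^t - Aw^{t-1}\|_{\tilde H}^2 \right] \\
&\quad + \frac{\rho}{2}\left[ \|v - v^{t-1}\|_G^2 - \|v - v^t\|_G^2 - \|v^t - v^{t-1}\|_G^2 \right] \\
&\quad + \frac{\rho}{2}\left[ \|Aw^{t-1} - Bv - c\|_2^2 - \|Aw^t - Bv - c\|_2^2 - \|Aw^{t-1} - Bv^t - c\|_2^2 \right] \\
&\quad + \frac{1}{2\rho}\left[ \|\alpha - \alpha^{t-1}\|_2^2 - \|\alpha - \alpha^t\|_2^2 \right],
\end{aligned}$$

which can be rewritten as the following bound:

$$\begin{aligned}
r_t &\le X_t + Y_t + Z_t \\
&\quad + \frac{1}{2}\left[ \frac{1}{1+\theta}\|Aw^{t-1} - Bv - c\|_{\tilde H}^2 - \|Aw^t - Bv - c\|_{\tilde H}^2 \right] + \frac{\rho}{2}\left[ \|v - v^{t-1}\|_G^2 - (1+\theta)\|v - v^t\|_G^2 \right] \\
&\quad + \frac{\rho}{2}\left[ \frac{1}{1+\theta}\|Aw^{t-1} - Bv - c\|_2^2 - \|Aw^t - Bv - c\|_2^2 \right] + \frac{1}{2\rho}\left[ \|\alpha - \alpha^{t-1}\|_2^2 - (1+\theta)\|\alpha - \alpha^t\|_2^2 \right] \\
&= X_t + Y_t + Z_t + (1+\theta)^{-1}\delta_{t-1} - \delta_t,
\end{aligned}$$

where

$$\begin{aligned}
X_t &= -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 - \frac{1}{2}\|w^t - w^{t-1}\|_H^2 + \frac{\theta}{2\rho}\|\alpha^t - \alpha\|_2^2, \\
Y_t &= -\frac{\lambda}{2}\|v^t - v\|_2^2 + \frac{\rho\theta}{2}\|v - v^t\|_G^2 - \frac{\rho}{2}\|v^t - v^{t-1}\|_G^2, \\
Z_t &= \frac{\rho\theta}{2(1+\theta)}\|Aw^{t-1} - Bv - c\|_2^2 + \frac{\theta}{2(1+\theta)}\|Aw^{t-1} - Bv - c\|_{\tilde H}^2 - \frac{\rho}{2}\|Aw^{t-1} - Bv^t - c\|_2^2.
\end{aligned}$$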

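We can bound $X_t$ as follows:

$$\begin{aligned}
X_t &= -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 - \frac{1}{2}\|w^t - w^{t-1}\|_H^2 + \frac{\theta}{2\rho}\|\alpha^t - \alpha\|_2^2 \\
&\le -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 - \frac{1}{2\sigma_{\max}(H)}\|H(w^t - w^{t-1})\|_2^2 + \frac{\theta}{2\rho}\|\alpha^t - \alpha\|_2^2 \\
&\le \frac{1}{2}\max_u\left[ -\gamma\|A^\top(\alpha^t - \alpha) + u\|_2^2 - \sigma_{\max}(H)^{-1}\|u\|_2^2 \right] + \frac{\theta}{2\rho}\|\alpha^t - \alpha\|_2^2 \\
&= -\frac{\gamma}{2(\gamma\sigma_{\max}(H) + 1)}\|A^\top(\alpha^t - \alpha)\|_2^2 + \frac{\theta}{2\rho}\|\alpha^t - \alpha\|_2^2 \le 0.
\end{aligned}$$

The last inequality uses the assumption on $\theta$ in the theorem. We also have

$$\begin{aligned}
Z_t &= \frac{\rho\theta}{2(1+\theta)}\|Aw^{t-1} - Bv - c\|_2^2 + \frac{\theta}{2(1+\theta)}\|Aw^{t-1} - Bv - c\|_{\tilde H}^2 - \frac{\rho}{2}\|Aw^{t-1} - Bv^t - c\|_2^2 \\
&\le \frac{\rho}{2}\left[ \frac{\theta(1 + \sigma_{\max}(\tilde H)/\rho)}{1+\theta}\|Aw^{t-1} - (Bv + c)\|_2^2 - \|Aw^{t-1} - (Bv^t + c)\|_2^2 \right] \\
&\le \frac{\theta\rho(\rho + \sigma_{\max}(\tilde H))}{2(\rho - \theta\sigma_{\max}(\tilde H))}\|B(v^t - v)\|_2^2,
\end{aligned}$$

where the second inequality uses the fact that

$$\frac{\theta(1+a)}{1+\theta}\|u\|_2^2 - \|u'\|_2^2 \le \frac{\theta(1+a)}{1 - \theta a}\|u - u'\|_2^2$$

when $\theta a < 1$ with $a = \sigma_{\max}(\tilde H)/\rho$. Therefore

$$\begin{aligned}
Y_t + Z_t &\le -\frac{\lambda}{2}\|v^t - v\|_2^2 + \frac{\rho\theta}{2}\|v - v^t\|_G^2 + \frac{\theta\rho(\rho + \sigma_{\max}(\tilde H))}{2(\rho - \theta\sigma_{\max}(\tilde H))}\|B(v^t - v)\|_2^2 \\
&\le \left[ -\frac{\lambda}{2} + \frac{\rho\theta}{2}\sigma_{\max}(G) + \frac{\theta(\rho + \sigma_{\max}(\tilde H))}{2(1-s)}\sigma_{\max}(B)^2 \right]\|v^t - v\|_2^2 \le 0.
\end{aligned}$$

Therefore we obtain

$$r_t \le X_t + Y_t + Z_t + (1+\theta)^{-1}\delta_{t-1} - \delta_t \le (1+\theta)^{-1}\delta_{t-1} - \delta_t.$$

Now by multiplying the above displayed inequality by $(1+\theta)^{t-T}$, and summing over $t = 1, \ldots, T$, we
obtain (5).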

In order to obtain (6), we simply note that (9) implies that

$$\nabla\phi^*(-A^\top\hat\alpha^t) - w^t = 0. \tag{14}$$

Therefore (10) can be replaced by the following inequality:

$$\begin{aligned}
&\phi^*(-A^\top\hat\alpha^t) - \phi^*(-A^\top\alpha) + (\alpha^t - \alpha)^\top Aw^t + (-Bv - c)^\top \tilde H A(w^{t-1} - w^t) \\
&\le \nabla\phi^*(-A^\top\hat\alpha^t)^\top (-A^\top\hat\alpha^t + A^\top\alpha) + (\alpha^t - \alpha)^\top Aw^t - \frac{\gamma}{2}\|-A^\top\hat\alpha^t + A^\top\alpha\|_2^2 + (-Bv - c)^\top \tilde H A(w^{t-1} - w^t) \\
&= -(w^t)^\top (A^\top\alpha^t + H(w^t - w^{t-1}) - A^\top\alpha) + (\alpha^t - \alpha)^\top Aw^t - \frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 + (-Bv - c)^\top \tilde H A(w^{t-1} - w^t) \\
&= -\frac{\gamma}{2}\|A^\top(\alpha^t - \alpha) + H(w^t - w^{t-1})\|_2^2 + \frac{1}{2}\left[ \|Aw^{t-1} - Bv - c\|_{\tilde H}^2 - \|Aw^t - Bv - c\|_{\tilde H}^2 - \|Aw^t - Aw^{t-1}\|_{\tilde H}^2 \right],
\end{aligned} \tag{15}$$

where the first inequality uses the fact that $\phi^*$ is $\gamma$ strongly convex, which is a direct consequence of the
fact that $\phi$ is $1/\gamma$ smooth. The first equality is due to (14) and the definition of $\hat\alpha^t$. The second
equality is algebra.

Now, we note that the right hand side of (15) is the same as that of (10). Therefore the
remainder of the proof follows the same argument as that of (5), where we simply use the addition
of (15), (12), and (13) to replace the addition of (10), (12), and (13). This leads to (6).

B Proof of Corollary 3.2


We have from ([6]) 

2(1 + 6)^ + g{v'^) — <!)*{—a) — g{v) + [a^)^{Bv + c) — cx^{Bv^ + c) 

<{p + ams,AH))\\Aw^ - Bv - c\\l + p{l + 6')||r;° - v\\}. + ^-^||a - a^\\l 


P 


<2{p + ama^{H))\\Aw° - Bv^ - c||2 


„o 


+ {2{p + CF^^^{H))a^^^{B)‘^ + p{^ + 6*)'7max(G))||t'*^ — fill + 


1 + 


\a — a 


. 0||2 


Now we set a = —{A'^)^'S/(j){A'^{Bv^ + c)) and v = S/g*{B~^a^). This choice achieves the 
maximum value of the left hand side over (a,f). With this choice, and the definition of convex 
conjugate, we obtain 

2(1 + ef[(l){A+{Bv^ + c)) + g{v^) - D{a^)] 

<(2(p + <Tniax(-^))Cmax(^) + p(l + ^)'7max(G) ) ||f — f ||| + 


,.0 „,||2 , 1 + ^ ||„, m||2 


\a — a 


+ 2{p + arQB.:^{H))\\A'uP — BiP — c|||. 
From Corollary I, 1. 11 we obtain 


\-T 


P\\M...T mi2 I 1 + ^||„,T ||2 , Pi^^^)\\„.T ||2 ^(1 + ^) eO 




\a — a*||2 + 


\v — f^llc < 


2 " ~ 2 

Therefore 

\\A{w^ - w^-^)\\l < 2\\A{w^ - f;)||| + 2\\A{w^-^ - w)\\l < 2(2 + 0)(1 + 9)-^S^Jp. 
Moreover, (fT71) also implies Wa^ — a*!!! < p{l + 0) ^(1 + 9) '^6^. Therefore 

I 2 


(16) 

(17) 


Id'^ - a*||2 < ||a'^ - a*||2 + crmax(^)||^(?i^'^ - w'^ ^) 


<cTrm^{H)^2{2 + 9){1 + 9)-T62 /p + ^p{l + 0)-i(l + 0)-^6O. 

It follows from the definition of $b_2(\cdot)$ that 

$$\|v - v^0\|_2^2 \le b_2\big((1+\theta)^{-T}\delta_0\big).$$

Similarly, we obtain from (17) that $\|v^T - v_*\|_G^2 \le (1+\theta)^{-T}\delta_0/(\rho + \rho\theta)$. It implies that 

$$\|\alpha - \alpha^0\|_2^2 \le b_1\big((1+\theta)^{-T}\delta_0\big).$$

Now the first desired bound of the theorem can be obtained by plugging the estimates of $\|v - v^0\|_2^2$ 
and $\|\alpha - \alpha^0\|_2^2$ into (16). 

For the second desired bound, we note from Jensen's inequality and (6) that 

*{—A~^a'^) + gipF) — (l)*{—A^ a) — g{v) + {c(^)^{Bv + c) — oi^ {BlF + c) 


— 


E*=i(i + 0) 




^{p + crmax(-H'))Pu;'^ - Bv - c||| + |||f° - v\\g + ^II« - «'^ll 2 


1 


Again we simply take the choice of $(\alpha, v)$ that achieves the maximum on the left hand side: 
$\alpha = -(A^+)^\top\nabla\phi(A^+(B\bar v^T + c))$ and $v = \nabla g^*(B^\top\bar\alpha^T)$. 
C Proof of Theorem 3.2 


The basic proof structure is the same as that of Theorem 3.1. The fact that $w^t$ minimizes the 
objective function in line 4 of Algorithm 2, together with the relationship of $\alpha^t$ and $\alpha^{t-1}$ in line 5, 
implies that 

$$\nabla\phi(w^{t-1}) = -A^\top\alpha^t + H(w^{t-1} - w^t). \qquad (18)$$

We thus obtain 

A{w* — rc) + A{w^~^ — w^) 

+ {Aw — Bv — c)~^ H A{w^~^ — w^) 

<SI(l){w^~^y‘ {w^~^ — w) + {a^)~^A{w^ — w) + oi^A{w^~^ — w^) 


+ {Aw — Bv — c) ' HA{w^ ^ — w^) — ■^||V(/>('u;* — V(/>('u;)||| 


1 ^,.t\ 1\ 

= {H{w^~^ — w^) — AJ a^'Y' {w^~^ — tc) + {a*')'' A{w^ — w) + (x' A{w^~^ — w*) 
+ {Aw -Bv- cYhA{w^-^ - w*) - |||A^(a* - a) + H{w^ - w^-^Wl 
= {w^ - w^-Y'^{A~^{a^ - a)) - |||A^(a* - a) + H{w^ - u;‘-^)||i 


+ 


1 


||Arc* ^ — Bv — c||^ — IIArc* — Bv — c||^ + \\w^ — w 


\h 


(19) 


where the derivation uses similar arguments as those of (10). The first inequality uses the smoothness 
of $\phi$, and the first equality uses (18). The second equality is algebra. 

We also have, from the optimality of $v^t$ for minimizing the objective function in line 3 of 
Algorithm 2 and the relationship of $\alpha^t$ and $\alpha^{t-1}$ in line 5, that (12) holds. Finally, we can also obtain 
(13). 

By adding (19), (12), and (13), and using the simplified notation $\Delta w = w^t - w^{t-1}$ and $\Delta\alpha = \alpha^t - \alpha$, 
we obtain 

n < Aw'^{A~^Aa) - |||A^Aa + HAwg + ^||Au;|||^ + ^l|Aa||i 


Xt 


A, 


— ||u* — u ||2 + —\\v — u*||g —-\\v* — ^IIg 

2II 11^ 2 " " 2 


Yt 


+ 2(1 + g) ll^^‘ \Bv + c)Yh + 2(/+g) ll^^* ^ - Bv - c\\l - ^\\Aw^ ^ - Bv* - 


Zt 


+ 


1 + 


\\Aw*-* - {Bv + c)\\% - \\Aw* - {Bv + c 




\G 

+ f - «■“ - '=112 - -Bv- ciiii 

+ i(||a-a‘-‘||i-(l+«)||a-a‘||l]. 
We can bound $X_t$ as follows: 


X,=- (HAwf(riI - N-‘)(A^Aa) - |(|M’'Aa||^ + ||ifAw||l) + \\\Aw\\'‘„ + ^l|Aa||^ 

Ta II "I' “ II U A„ ,l|2 ^IIaTa^I|2 I ^IIA^I|2 


<(7 - l/(Tmm{H))\\HXw\\2\\A' Xa\\2 - 
< - 


-||FAn;||^-^||A'Aa||^ + -||Aa||^ 


b M'^Aall^ + fllAalli < 0 . 

2amin{H) 2p 
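
The step in this chain that eliminates $\|H\Delta w\|_2$ rests on an elementary fact: for $a > 0$, the concave quadratic $x \mapsto bx - ax^2$ is maximized at $x = b/(2a)$ with value $b^2/(4a)$. The following is a minimal numerical sketch of that fact only; the names `max_quadratic` and `grid_max` and the coefficients `a`, `b` are hypothetical stand-ins, not the constants of the theorem.

```python
def max_quadratic(a, b):
    """Closed-form maximum of b*x - a*x**2 over real x, assuming a > 0."""
    return b * b / (4.0 * a)


def grid_max(a, b, hi=10.0, n=10001):
    """Brute-force maximum of b*x - a*x**2 on an even grid over [0, hi]."""
    best = float("-inf")
    for i in range(n):
        x = hi * i / (n - 1)
        best = max(best, b * x - a * x * x)
    return best


# Hypothetical coefficients: x plays the role of ||H * Delta w||_2.
a, b = 2.0, 3.0
closed = max_quadratic(a, b)
approx = grid_max(a, b)
```

Here the maximizer $x = 0.75$ lies on the grid, so `approx` agrees with `closed`; in general the grid value can only fall below the closed form.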


The first inequality uses the assumption on $\gamma$ in the theorem, and norm inequalities. 
The second inequality is obtained by taking the maximum over $\|H\Delta w\|_2$. The last inequality 
uses the assumptions on $\theta$. We can also use the same derivation as that of Theorem 3.1 to show 
that $Y_t + Z_t \le 0$. Therefore 


rt Y Xt+ Yt — ^\\v^ — + Zt + {1 + 0) — (it < (1 + 0) — 

We can multiply the above by $(1+\theta)^{t-1}$ and then sum over $t = 1, \ldots, T$ to obtain (7). 
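
The weighting by $(1+\theta)^{t-1}$ is what makes the sum telescope. As a sketch, assume the per-iteration bound has the form $r_t \le \delta_{t-1} - (1+\theta)\delta_t$; this form is an assumption made here for illustration, suggested by the pattern $\|\alpha-\alpha^{t-1}\|_2^2 - (1+\theta)\|\alpha-\alpha^t\|_2^2$ in the preceding display. The weighted sum of the right-hand sides then collapses to $\delta_0 - (1+\theta)^T\delta_T$. The function name and the sample values below are hypothetical.

```python
def weighted_sum_bound(deltas, theta):
    """Sum over t = 1..T of (1+theta)^(t-1) * (delta_{t-1} - (1+theta)*delta_t).

    deltas holds delta_0, ..., delta_T.
    """
    T = len(deltas) - 1
    total = 0.0
    for t in range(1, T + 1):
        weight = (1.0 + theta) ** (t - 1)
        total += weight * (deltas[t - 1] - (1.0 + theta) * deltas[t])
    return total


# Hypothetical values for delta_0..delta_4, chosen only for illustration.
theta = 0.25
deltas = [8.0, 5.0, 3.0, 2.0, 1.0]
lhs = weighted_sum_bound(deltas, theta)
# Telescoped form: delta_0 - (1+theta)^T * delta_T.
rhs = deltas[0] - (1.0 + theta) ** 4 * deltas[4]
```

The intermediate terms cancel in pairs, so `lhs` and `rhs` coincide; only the first and last $\delta$ survive.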

Similarly we can prove a dual version of (19) below. The equation in (18) and the definition of 
$\tilde\alpha^t$ in the theorem imply that 

$$\nabla\phi^*(-A^\top\tilde\alpha^t) = w^{t-1}.$$

We thus have 


cJ)*i-A ' d*) - (P*{-A ' a) + {Y - a )' Aw^ + {-Bv - c)' HA{w^-^ - w^) 
<X<j)*{-A~^ay{-A~^a^ + A~^a) - - a)\\l 

+ (a* — a)~^ Aw^ + {—Bv — c)~^ HA{w^~^ — w^) 

={w^-^)^{-{A'^+ H{w^ - + A'^a) + (a* - a)^Aw^ 

— ^11^"''(a* — a) -|- H{w^ — w *~^)\\2 + {—Bv — c)~^HA{w^~^ — w^) 


={w^ - w^-^) ' {A ' (a* - a)) - - a) + H{w* - w 




1 r 


+ - \\Aw* — Bv — c\\jj — \\Aw^ — Bv — c\\g + \\w^ — w' \\jj 


..4-l||2 


( 20 ) 


In the above derivation, the first inequality uses the strong convexity of $\phi^*$, which follows from 
the smoothness of $\phi$. The first equality uses the relationship of $\nabla\phi^*(-A^\top\tilde\alpha^t)$ and $w^{t-1}$ and the 
relationship of $\tilde\alpha^t$ and $\alpha^t$. The last equality uses algebra. Note that the right hand side of (19) and 
that of (20) are the same. Therefore by adding (20), (12), and (13), we obtain 


T* < 


+ Yt- 


-+ Zt + {i + -St<{i + - 6t. 


We can multiply both sides by $(1+\theta)^{t-1}$, and then sum over $t = 1, \ldots, T$ to obtain (8). 